Self-supervised cross-modality super-resolution (SR) can overcome the difficulty of acquiring paired training data, but is challenging since only low-resolution (LR) source and high-resolution (HR) guide images are available. Existing methods utilize pseudo or weak supervision in the LR space, and thus deliver results that are blurry or not faithful to the source modality. To address this issue, we propose a mutual modulation SR (MMSR) model, which tackles the task via a mutual modulation strategy, including a source-to-guide modulation and a guide-to-source modulation. In these modulations, we develop cross-domain adaptive filters to fully exploit cross-modality spatial dependency, helping induce the source to emulate the resolution of the guide and induce the guide to mimic the modality characteristics of the source. Moreover, we adopt a cycle consistency constraint to train MMSR in a fully self-supervised manner. Experiments on various tasks demonstrate the state-of-the-art performance of our MMSR.
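To make the self-supervision concrete, here is a minimal sketch of a cycle consistency constraint of the kind described above: the super-resolved source is mapped back to the LR grid and compared against the LR input, so no HR source ground truth is needed. The function name and the choice of bicubic resampling and L1 distance are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch (assumed names, not the authors' code) of a cycle
# consistency constraint for self-supervised SR: the SR output is resampled
# back to the LR grid and compared against the LR input, so training needs
# no HR source ground truth.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(sr_source: torch.Tensor, lr_source: torch.Tensor) -> torch.Tensor:
    """sr_source: (B, C, sH, sW) SR prediction; lr_source: (B, C, H, W) LR input."""
    down = F.interpolate(sr_source, size=lr_source.shape[-2:],
                         mode='bicubic', align_corners=False)  # back to the LR grid
    return F.l1_loss(down, lr_source)  # the cycle should reproduce the LR source
```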
In recent years, deep neural networks (DNNs) have achieved great progress in light field (LF) image super-resolution (SR). However, existing DNN-based LF image SR methods are developed on a single fixed degradation (e.g., bicubic downsampling), and thus cannot be applied to super-resolve real LF images with diverse degradations. In this paper, we propose the first method to handle LF image SR with multiple degradations. In our method, a practical LF degradation model is developed to approximate the degradation process of real LF images. Then, a degradation-adaptive network (LF-DAnet) is designed to incorporate the degradation prior into the SR process. By training on LF images with multiple synthetic degradations, our method can learn to adapt to different degradations while incorporating both spatial and angular information. Extensive experiments on both synthetically degraded and real-world LFs demonstrate the effectiveness of our method. Compared with existing state-of-the-art single and LF image SR methods, our method achieves superior SR performance under a wide range of degradations, and generalizes better to real LF images. Code and models are available at https://github.com/yingqianwang/lf-danet.
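For illustration, the sketch below shows the generic "blur, downsample, add noise" pipeline that practical SR degradation models of this kind follow; the kernel handling and noise model here are assumptions for the sketch, not the paper's exact LF degradation model.

```python
# A hedged sketch of the generic "blur -> downsample -> noise" structure that
# practical SR degradation models follow; kernel and noise settings here are
# illustrative assumptions, not the paper's LF degradation model.
import torch
import torch.nn.functional as F

def degrade(hr: torch.Tensor, kernel: torch.Tensor, scale: int, noise_sigma: float) -> torch.Tensor:
    """hr: (B, C, H, W) image; kernel: (k, k) isotropic blur kernel."""
    c, k = hr.shape[1], kernel.shape[-1]
    weight = kernel.expand(c, 1, k, k)                 # one blur kernel per channel
    padded = F.pad(hr, [k // 2] * 4, mode='reflect')
    blurred = F.conv2d(padded, weight, groups=c)       # channel-wise blur
    lr = blurred[..., ::scale, ::scale]                # s-fold downsampling
    return lr + noise_sigma * torch.randn_like(lr)     # additive Gaussian noise
```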
Light field (LF) cameras record both the intensity and direction of light rays, and encode 3D scenes into 4D LF images. Recently, many convolutional neural networks (CNNs) have been proposed for various LF image processing tasks. However, it is challenging for CNNs to effectively process LF images, since spatial and angular information are highly entangled with varying disparities. In this paper, we propose a generic mechanism to disentangle this coupled information for LF image processing. Specifically, we first design a class of domain-specific convolutions to disentangle LFs along different dimensions, and then leverage these disentangled features by designing task-specific modules. Our disentangling mechanism can well incorporate the LF structure prior and effectively handle 4D LF data. Based on the proposed mechanism, we develop three networks (i.e., DistgSSR, DistgASR, and DistgDisp) for spatial super-resolution, angular super-resolution, and disparity estimation. Experimental results show that our networks achieve state-of-the-art performance on all three tasks, which demonstrates the effectiveness, efficiency, and generality of our disentangling mechanism. Project page: https://yingqianwang.github.io/distglf/.
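As a rough illustration of the disentangling idea, the sketch below operates on a macro-pixel LF image of shape (B, C, A*H, A*W), where A is the angular resolution: a spatial branch convolves within each sub-aperture view (via dilation A), while an angular branch convolves across the A x A views sharing each spatial position (kernel A, stride A). Exact layer settings are assumptions; see the project page for the actual design.

```python
# A rough sketch of domain-specific convolutions on a macro-pixel LF image of
# shape (B, C, A*H, A*W). Layer settings are assumptions for illustration.
import torch
import torch.nn as nn

class DisentangleConv(nn.Module):
    def __init__(self, channels: int, ang_res: int):
        super().__init__()
        A = ang_res
        # spatial branch: dilation A makes the 3x3 kernel step between pixels
        # belonging to the same sub-aperture view
        self.spatial = nn.Conv2d(channels, channels, 3, padding=A, dilation=A)
        # angular branch: an A x A kernel with stride A aggregates the A*A views
        # that share each spatial position
        self.angular = nn.Conv2d(channels, channels, A, stride=A)

    def forward(self, x: torch.Tensor):
        return self.spatial(x), self.angular(x)  # (B,C,A*H,A*W) and (B,C,H,W)
```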
Infrared small target super-resolution (SR) aims to recover reliable and detailed high-resolution images with high-contrast targets from their low-resolution counterparts. Since infrared small targets lack color and fine structural information, it is important to exploit the supplementary information among sequential images to enhance the targets. In this paper, we propose the first infrared small target SR method, named local motion and contrast prior driven deep network (MoCoPnet), to integrate the domain knowledge of infrared small targets into the deep network, which can mitigate the intrinsic feature scarcity of infrared small targets. Specifically, motivated by the local motion prior in the spatio-temporal dimension, we propose a local spatio-temporal attention module to perform implicit frame alignment and incorporate local spatio-temporal information to enhance local features (especially for small targets). Motivated by the local contrast prior in the spatial dimension, we propose a central difference residual group to incorporate central difference convolution into the feature extraction backbone, which can achieve center-oriented gradient-aware feature extraction to further improve the target contrast. Extensive experiments demonstrate that our method can recover accurate spatial dependency and improve the target contrast. Comparative results show that MoCoPnet outperforms state-of-the-art video SR and single image SR methods in terms of both SR performance and target enhancement. Based on the SR results, we further investigate the influence of SR on infrared small target detection, and the experimental results demonstrate that MoCoPnet promotes detection performance. Code is available at https://github.com/xinyiying/mocopnet.
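For reference, below is a hedged sketch of central difference convolution, the gradient-aware operator a central difference residual group builds on: a vanilla convolution is combined with a term that differences each neighborhood against its center pixel. The theta default and layer settings are illustrative assumptions.

```python
# A hedged sketch of central difference convolution: a vanilla convolution
# minus a term that references each neighborhood's center pixel, yielding
# center-oriented gradient-aware features. The theta default is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.theta = theta  # theta = 0 recovers a vanilla convolution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        vanilla = self.conv(x)
        # differencing each neighborhood against its center reduces to subtracting
        # (sum of kernel weights) * x, applied as a 1x1 convolution
        w_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        return vanilla - self.theta * F.conv2d(x, w_sum)
```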
Single-frame infrared small target (SIRST) detection aims to separate small targets from clutter backgrounds. With the advances of deep learning, CNN-based methods have yielded promising results in generic object detection due to their powerful modeling capability. However, existing CNN-based methods cannot be directly applied to infrared small targets, since the pooling layers in their networks could lead to the loss of targets in deep layers. To handle this problem, we propose a dense nested attention network (DNANet) in this paper. Specifically, we design a dense nested interactive module (DNIM) to achieve progressive interaction between high-level and low-level features. With the repetitive interaction in DNIM, infrared small targets can be maintained in deep layers. Based on DNIM, we further propose a cascaded channel and spatial attention module (CSAM) to adaptively enhance multi-level features. With our DNANet, the contextual information of small targets can be well incorporated and fully exploited through repetitive fusion and enhancement. Moreover, we develop an infrared small target dataset (namely, NUDT-SIRST) and propose a set of evaluation metrics to conduct comprehensive performance evaluation. Experiments on both public and our self-developed datasets demonstrate the effectiveness of our method. Compared to other state-of-the-art methods, our method achieves better performance in terms of probability of detection (Pd), false-alarm rate (Fa), and intersection over union (IoU).
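As a sketch of what a cascaded channel and spatial attention module can look like, the code below follows the common CBAM-style recipe: channel re-weighting followed by spatial re-weighting. The pooling choices and reduction ratio are assumptions for illustration, not necessarily those of the paper's CSAM.

```python
# A sketch of cascaded channel-then-spatial attention in the common CBAM style;
# pooling choices and the reduction ratio are illustrative assumptions.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel = nn.Sequential(                      # squeeze-and-excite style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(                      # 7x7 conv over pooled maps
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel(x)                            # channel re-weighting
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.max(1, keepdim=True).values], dim=1)
        return x * self.spatial(pooled)                    # spatial re-weighting
```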
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
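A toy sketch of the token-level fusion idea (not the released code): image and point cloud tokens, assumed to already carry 3D position encodings, are concatenated into one memory that learnable object queries attend to. All shapes and layer sizes below are made up for illustration.

```python
# A toy sketch of token-level multi-modal fusion with a standard transformer
# decoder; dimensions and layer counts are illustrative assumptions.
import torch
import torch.nn as nn

d_model = 256
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=6)

img_tokens = torch.randn(2, 1000, d_model)   # image features + 3D position encoding
pts_tokens = torch.randn(2, 2000, d_model)   # point features + 3D position encoding
queries = torch.randn(2, 900, d_model)       # learnable object queries

memory = torch.cat([img_tokens, pts_tokens], dim=1)  # implicit multi-modal alignment
decoded = decoder(queries, memory)  # per-query features; a head would regress boxes
```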
Knowledge graphs (KG) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works on KG and CKG completion suffer from long-tail relations and newly-added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce the challenges of FKGC and commonly used KGs and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully exploit the relationship between support and query features within a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers that more appropriately re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at both the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features, and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shot counts, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method in the 10/30-shot settings. We further demonstrate the superiority of our approach on few-shot object detection. Code and models will be available.
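As a simplified sketch of the first insight, support masks can pool support features into a per-class center that re-weights query features channel-wise. The helper names below are assumptions, and the instance-level enhancement via object queries is omitted.

```python
# A simplified sketch (assumed helper names) of mask-based dynamic class
# centers: support masks pool support features into a per-class vector that
# re-weights query features channel-wise.
import torch

def dynamic_class_center(sup_feat: torch.Tensor, sup_mask: torch.Tensor) -> torch.Tensor:
    """sup_feat: (B, C, H, W); sup_mask: (B, 1, H, W) binary mask -> (B, C) center."""
    masked_sum = (sup_feat * sup_mask).sum(dim=(2, 3))
    return masked_sum / sup_mask.sum(dim=(2, 3)).clamp(min=1e-6)  # masked average

def reweight_query(query_feat: torch.Tensor, center: torch.Tensor) -> torch.Tensor:
    """Re-weight query features (B, C, H, W) channel-wise by the class center."""
    return query_feat * torch.sigmoid(center)[:, :, None, None]
```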
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs carry a large number of parameters, which makes them computationally expensive and hence difficult to deploy on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
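For context, the sketch below shows the generic distillation objective that GNN KD frameworks, including fairness-aware ones such as RELIANT, typically build on: the student matches softened teacher logits alongside the ordinary task loss. The temperature and weighting defaults are illustrative, not values from the paper.

```python
# A sketch of the generic KD objective that GNN compression methods build on;
# the temperature T and mixing weight alpha are illustrative defaults.
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
            labels: torch.Tensor, T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction='batchmean') * T * T   # T^2 restores gradient magnitude
    hard = F.cross_entropy(student_logits, labels)   # ordinary classification loss
    return alpha * soft + (1 - alpha) * hard
```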
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity between the efficient Inverted Residual Block in MobileNetV2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Massive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN- and Transformer-based models, while trading off model accuracy and efficiency well.
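Below is a speculative sketch of an inverted-residual-style block that pairs a depthwise convolution (short-distance dependency) with multi-head self-attention (long-distance interactions), in the spirit of the iRMB; the actual block fuses these components differently, and all layer settings here are assumptions.

```python
# A speculative sketch pairing depthwise convolution (local mixing) with
# multi-head self-attention (global mixing) inside an inverted-residual shape;
# not the actual iRMB, whose fusion of these parts differs in detail.
import torch
import torch.nn as nn

class InvertedResidualAttnBlock(nn.Module):
    def __init__(self, dim: int, expand: int = 4, heads: int = 4):
        super().__init__()
        hidden = dim * expand
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.expand = nn.Conv2d(dim, hidden, 1)                        # expansion
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.project = nn.Conv2d(hidden, dim, 1)                       # projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)       # long-range mixing
        x = x + attn_out.transpose(1, 2).reshape(b, c, h, w)
        return x + self.project(self.act(self.dw(self.expand(x))))  # local mixing
```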